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ABSTRACT 

In November 2019 the nearby single, isolated DQ-type white dwarf LAWD 37 (LP 145-141) aligned closely with a distant 
background source and caused an astrometric microlensing event. Leveraging astrometry from Gaia and followup data from the 
Hubble Space Telescope we measure the astrometric deflection of the background source and obtain a gravitational mass for 
LAWD 37. The main challenge of this analysis is in extracting the lensing signal of the faint background source whilst it is buried 
in the wings of LAWD 37’s point spread function. Removal of LAWD 37’s point spread function induces a significant amount 
of correlated noise which we find can mimic the astrometric lensing signal. We find a deflection model including correlated 
noise caused by the removal of LAWD 37’s point spread function best explains the data and yields a mass for LAWD 37 of 
0.56 + 0.08Mo. This mass is in agreement with the theoretical mass-radius relationship and cooling tracks expected for CO 
core white dwarfs. Furthermore, the mass is consistent with no or trace amounts of hydrogen that is expected for objects with 
helium-rich atmospheres like LAWD 37. We conclude that further astrometric followup data on the source is likely to improve 
the inference on LAWD 37’s mass at the ~ 3 percent level and definitively rule out purely correlated noise explanations of the 
data. This work provides the first semi-empirical test of the white dwarf mass-radius relationship using a single, isolated white 
dwarf and supports current model atmospheres of DQ white dwarfs and white dwarf evolutionary theory. 
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1 INTRODUCTION con et al. 2010) and the white dwarf upper mass limit (~ 1.4Mo) 
underpins our understanding of the progenitors of type Ia supernovae. 


Comores een (0D) cone while diya iseate tiie inal eyolunenany Moreover, it is vital when using cooling white dwarfs to date stellar 


stage for the vast majority of stars (< 8Mo). They are expected to 
consist mainly of an electron-degenerate core, surrounded by a thin 
envelope of non-degenerate hydrogen and helium (e.g., Tremblay 
& Bergeron 2008). Their mainly degenerate nature means they are 
expected to follow a mass-radius relationship (MRR) as they evolve 
and cool. The white-dwarf MRR is important to many areas of as- 
trophysics. It is typically relied upon to calculate the mass of white 
dwarfs from photometric or spectroscopic measurements (e.g., Fal- 
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populations in globular clusters (e.g., Hansen et al. 2002). 


Toa first-order approximation, in a white dwarf, the inward force of 
gravity is balanced by the outward pressure of the electron-degenerate 
gas, resulting in a white dwarf’s radius being inversely proportional 
to the cube root of its mass (Chandrasekhar 1935). Today, detailed 
evolutionary cooling models including specific degenerate core com- 
positions, the mass of the non-degenerate interior hydrogen layers, 
and the effects of finite temperature are used to calculate theoretical 
MRRs (e.g., Bédard et al. 2020). Despite their sophistication, the- 
oretical MRRs are forced to rely on assumptions about the interior 
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structure of the white dwarf. This is because the masses of the grav- 
itationally stratified non-degenerate interior hydrogen, helium, and 
CO layers are poorly constrained by observations of the white dwarf’s 
surface. This is particularly important in the case of the mass frac- 
tion of the interior hydrogen layer (qy). Depending on temperature 
and whether a ’thick’ or ’thin’ hydrogen layer is assumed, theoretical 
MRRs can vary by 1-15 percent (Tremblay et al. 2017). 

For hydrogen-rich white dwarfs (DA type), a thick hydrogen layer 
is assumed (gy = 10~*) which is the estimated maximum hydrogen 
mass that post-asymptotic-giant-branch evolution models predict for 
a0.6Mo white dwarf after residual nuclear burning (Iben & Tutukov 
1984). Helium-rich white dwarfs (non-DA type) are either created 
with hydrogen deficient atmospheres or their hydrogen is hidden 
beneath their surface (Tremblay & Bergeron 2008). For these white 
dwarfs, a thin hydrogen layer is assumed (qy = 107 10) and represents 
only trace amounts of hydrogen. 

Despite the white dwarf MRR’s importance, observational tests 
of the relationship are challenging. Truly direct tests of the MRR 
are rare. This is only possible in cases where both the white dwarf’s 
mass and radius can be determined independently of any atmospheric 
models. Such a direct test has been performed in the case of eclipsing 
binary systems (e.g., Parsons et al. 2016). However, the configuration 
of these systems implies the white dwarf is always post-common 
envelope and has therefore evolved differently to an isolated white 
dwarf. Otherwise semi-empirical tests of the MRR which depend on 
white dwarf atmospheric models, are possible. 

By fitting atmospheric models to broad-band photometry and spec- 
troscopy (e.g., Giammichele et al. 2012), a white dwarfs’ atmospheric 
parameters (Tes, log g), and its solid angle can be measured. Combin- 
ing the solid angle with distance information (from parallax), allows 
the photometric radius of the white dwarf to be measured (e.g., Kilic 
et al. 2020). The photometric radius can then be combined with log g 
to infer the mass of the white dwarf, and the MRR can be tested 
(Schmidt 1996; Bédard et al. 2017; Bergeron et al. 2019). The prob- 
lem with this approach, however, is that both the mass and the radius 
are entirely derived from the atmospheric models and it is often dif- 
ficult to disentangle observed features of the MRR from systematic 
effects and degeneracies in the atmospheric models (Tremblay et al. 
2017). For robust semi-empirical tests of the MRR, the photometric 
radius needs to be combined with a mass determination independent 
of atmospheric models. 

For white dwarfs, it is also possible to obtain direct mass informa- 
tion from gravitational redshift measurements (Einstein 1916). The 
difficulty with this technique is that, for white dwarfs, the shift in 
the absorption lines due to gravitational redshift is of a similar size 
to, and degenerate with, the Doppler shift caused by radial motion. 
This means that the observed shift in spectral lines is a combina- 
tion of both effects and the gravitational redshift signal can only be 
isolated if the radial velocity of the white dwarf is precisely known. 
Determining the radial velocity of a white dwarf is possible when it 
is in a binary system by taking measurements of its companion (e.g., 
Joyce et al. 2018). The MRR can also be studied using gravitational 
redshift for groups of white dwarfs that are co-moving (e.g., Pasquini 
etal. 2019), or by averaging over random radial motions (e.g., Falcon 
et al. 2010; Chandra et al. 2020). 

The most precise direct masses (and semi-empirical tests of the 
MRR) for white dwarfs that are not post-common envelope, come 
from astrometric measurements of visual binary systems. Fig. 1 
shows the MRR for nearby white dwarfs in visual binaries with direct 
mass determinations; 40 Eri B (Bond et al. 2015), Procyon B (Bond 
et al. 2017a), and Sirius B (Bond et al. 2017b). Fig. 1 also shows 
Stein 2051 B, which also happens to be in a visual binary system 
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Figure 1. MRRs for nearby white dwarfs in visual binary systems. The masses 
for 40 Eri B, Procyon B, and Sirius B were determined from astrometric 
measurement of their orbits (Bond et al. 2015, 2017a,b). The error bars on 
Procyon B’s mass and Sirius B’s radius are too small to be seen. The mass for 
Stein 205 1B was obtained via astrometric microlensing (Sahu et al. 2017). For 
comparison, the red curve shows the theoretical MRR for zero-temperature 
white dwarfs with an iron (Fe) core (Hamada & Salpeter 1961). Theoretical 
MRRs for CO core white dwarfs were obtained from the cooling models of 
Bédard et al. (2020). Figure reproduced from Fig. 4 in Bond et al. (2017b). 


but its mass has been determined by astrometric microlensing (Sahu 
et al. 2017). All of these objects have photometric radius determina- 
tions, and are in agreement with the theoretical MRRs (Bédard et al. 
2020). A semi-empirical test of the MRR for a single and isolated 
white dwarf has yet to be performed. 

Astrometric microlensing events offer unique opportunities to 
measure the mass of isolated objects (e.g., Miralda-Escude 1996; 
Kains et al. 2017; Rybicki et al. 2018; Dong et al. 2019; Sahu et al. 
2022; Lam et al. 2022; Kaczmarek et al. 2022). When a foreground 
lens with mass My aligns closely enough with a more distant back- 
ground source, the gravitational field of the lens deflects the light 
of the background source forming a major and minor image of the 
source. The major image is formed on the same side of the lens as the 
source and the minor image on the opposite side of the lens. The two 
images, lens, and unlensed source position always lie along the same 
line (see e.g., Bramich 2018, for a review). As the lens intervenes 
between the source and observer, the images change position causing 
an apparent excursion of the source position (astrometric microlens- 
ing; Hgg et al. 1995; Miyamoto & Yoshii 1995; Walker 1995). For 
the astrometric microlensing event considered in this work, we are 
in a regime where the major image is resolved from the lens for 
the duration of the event. In this case the astrometric shift due to 
microlensing from the unlensed source position is (e.g., Sahu et al. 
2017; Bramich 2018), 


64(u) = ; [vu +4-ul Opa (1) 


Here, Og = \4Ge2M. (pF) - De) is the angular Einstein radius 
of the lensing system (Chwolson 1924; Einstein 1936), and G, c, Dr, 
and Ds are the gravitational constant, the speed of light, the distance 
to the lens and the distance to the source, respectively. u is the angular 
lens-source separation vector pointing towards the source position in 
units of Og or u = (Ws — M7) /Og = B/Og, where 7 5 denotes 
the angular position of the lens and source, respectively. u = |u| and 


a denotes the unit vector. Crucially, if 64 can be measured during 
the event and Dy, and Dg are known, then My can be inferred. 

For an astrometric microlensing event to be monitored it first must 
be found. Refsdal (1964) first noted that if the positions and proper 
motions of celestial objects were known with sufficient accuracy, 
then close alignments between them, and hence microlensing events, 
could be predicted ahead of time. This is in contrast to the currently 
dominant channel of finding photometric microlensing events, in 
which hundreds of millions of stars are monitored in the Galactic 
bulge and plane to catch event as they unfold (e.g., Mr6z et al. 2019; 
Kim et al. 2016; Husseiniova et al. 2021). Following the suggestion of 
Refsdal (1964) and Paczynski (1995, 1996) many attempts to predict 
microlensing events followed (Feibelman 1966, 1986; Sahu et al. 
1998; Salim & Gould 2000; Proft et al. 2011; Sahu et al. 2014; Proft 
2016; Lépine & DiStefano 2012; Harding et al. 2018). Follow-up of 
these predictions has proven difficult because of imprecise astrometry 
which lead to low confidence event predictions. Interest in predicting 
events was reignited with the advent of astrometry from the Gaia 
satellite (Gaia Collaboration et al. 2016a), orders of magnitude more 
precise and numerous than any of its predecessors. First, using Gaia 
Data Release 1 (Gaia Collaboration et al. 2016b), McGill et al. (2018) 
made one high confidence prediction of an astrometric microlensing 
event by a nearby white dwarf. Then, following Gaia Data Release 2 
(Gaia Collaboration et al. 2018) many independent studies searched 
for both astrometric and photometric events resulting in ~ 5000 
predictions (Kliiter et al. 2018a,b; Bramich 2018; Bramich & Nielsen 
2018; Mustill et al. 2018; Nielsen & Bramich 2018a; Ofek 2018; 
McGill et al. 2019a,b, 2020). Most recently Kliiter et al. (2021) 
searched for predicted microlensing events using Gaia Early Data 
Release 3 (GEDR3; Gaia Collaboration et al. 2021) and found 1758 
new events. 

Only two predicted astrometric microlensing events have been 
successfully followed before this paper. Using the Hubble Space 
Telescope (HST), Sahu et al. (2017) successfully detected the astro- 
metric signal of the microlensing event originally predicted by Proft 
et al. (2011) involving the nearby white dwarf Stein 2051B. The 
astrometric microlensing signal permitted Sahu et al. (2017) to mea- 
sure a gravitational mass for Stein 2051 B of 0.675 + 0.051 Mo. This 
marked the first ever detection of the astrometric microlensing effect 
outside the solar system and provided a direct test of white-dwarf 
evolutionary theory. Next, Zurlo et al. (2018) detected an astromet- 
ric microlensing event using the Very Large Telescope. This event 
was caused by our nearest stellar neighbour, Proxima Centauri, and 
was originally predicted by Sahu et al. (2014). Using the astrometric 
signal, Zurlo et al. (2018) determined the mass of Proxima Centauri 
to be 0. 1S0 Mo. This marked the first direct gravitational mass 
measurement of Proxima Centauri, and provided the only current 
opportunity for a direct mass determination. 

In this paper we present analysis of the follow-up of the astromet- 
ric microlensing event caused by the nearby DQ-type white dwarf 
LAWD 37 (LP 145-141), originally predicted by McGill et al. (2018). 
This event peaked in November 2019, with a predicted astrometric 
shift of 64 ~ 2.8 mas. First, we describe the two sources of data used 
in this analysis which are the multi-epoch astrometry from HST, and 
the astrometric solutions for the source and lens from GEDR3. Next, 
we detail the combination of these sources of data within a Bayesian 
framework to infer the mass of LAWD 37. Then we describe the con- 
cept of leave-one-out cross validation and use it to compare different 
noise models of the data. We then use the inferred gravitational mass 
from the astrometric microlensing event as a semi-empirical test of 
the white dwarf MRR. Finally, we explore the implications of this 
work on the follow-up of future predicted microlensing events. 
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Epoch Observation Filter Subarray Exposure Number of 
date size (pixels) time (s) exposures 

1 1 May 2019 F814W = .2048x2048 95 PI 

2 18 Sep 2019 F814W 2048x2048 = 82 5 

3 25 Oct 2019 F814W 2048x2048 = 89 6 

4 10 Nov 2019 F814W = 2048x2048 = 95 5 

5 26 Nov 2019 F814W_~ 1024x1024 = 85 13 

6 5 Dec 2019 F814W = 1024x1024 = 85 13 

7 3 Jan 2020 F814W 1024x1024 = 85 13 

8 5 May 2020 F814W_~ 1024x1024 = 85 13 

9 16 Sep 2020 F814W ~ 1024x1024 = 85 14 


Table 1. Summary of the HST WFC3/UVIS observations used to constrain 
the position of the source. The subarray is the pixel subarray of the UVIS2 
detector which were chosen to minimize Charge Transfer Efficiency (CTE) 
effects. The data are from HST programs GO- 16251, 15961, and 15705 (PI 
Kailash C. Sahu). 


2 DATA 


There are two sources of data used in this study. First, we have multi- 
epoch HST observations of the lensed source position before, during 
the predicted maximum of the event, and after. And, second, we have 
the astrometric solution of the source and lens from GEDR3. These 
astrometric solutions provide information on the unlensed source 
and lens trajectories. The combination of these two sources of data 
allows the astrometric shift due to microlensing to be measured, and 
consequently, LAWD 37’s mass to be determined. 


2.1 HST astrometric measurements 


We have nine successful epochs of WFC3/UVIS HST observations 
spanning over one year. A summary of these observation used to de- 
termine the position of the source! from HST programs GO- 16251, 
15961, and 15705 (PI Kailash C. Sahu) is given in Table 1. All HST 
epochs were taken with moderate-sized dithers (+100 pixels) among 
the pointings. The dither throw was small enough to allow all expo- 
sures to the include about 20 stars that could serve as astrometric 
references, but it was large enough to provide some control on Charge 
Transfer Efficiency (CTE) and distortion residuals. 

Figure 2 shows the path of LAWD 37 relative to the source star. In 
epoch 7, the image of the source is largely occulted by LAWD 37’s 
bleed column. In epochs 3 through 6, the bump-like features of the 
PSF introduce considerable complications into our measurement of 
the source’s location. For this reason, we have developed a model of 
the extended LAWD 37 PSF in the F814W images. 

We combined together all the F814W images of LAWD 37 from 
each of the 9 epochs and constructed a 4x-supersampled version of 
the LAWD 37 image for that epoch. The top row in Fig. 3 shows the 
9 stacked images (in the raw-pixel frame, so that the PSF features 
will all have the same orientation). When we subtracted the average 
of these nine stacks from each of the nine, we noted that most of the 
subtractions were quite clean, but the fourth epoch has considerably 


! Additionally, a series of WFC3/UVIS F814W short exposures (1.5s) were 
taken at each epoch in an attempt to constrain the position of the lens which 
was saturated in the long exposures. Unfortunately there were too few un- 
saturated, high-signal-to-noise stars common to both exposures needed to 
constrain the HST focus drift between the two exposure pointings. Therefore, 
we did not use any HST data to constrain the lens position and instead relied 
on the projected GEDR3 position of the lens which turned out to be more 
than adequate for the analysis (see Sections 2.2 and 5.3). 
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Figure 2. HST F814W-band image cutouts for each epoch of data during the 
LAWD 37 event. The source star is marked with a blue circle. LAWD 37 
is the saturated source moving from right to left. The images were made by 
stacking all images in a given epoch. Dates of each epoch are indicated in 
Julian years. 


larger subtraction residuals than the others, presumably due to the 
telescope experiencing an unusual focus level due to breathing (An- 
derson 2018). Unfortunately, this fourth epoch also has the source 
very close to the lens, which means that the position we measure for 
the source will be impacted by considerable PSF-subtraction errors. 
For this reason, we rejected the fourth epoch from further analysis. 
The source can be seen close to LAWD 37 in the third through 
the sixth epochs. We used the average PSFs from the eight good- 
focus epochs (with the source and its vicinity masked out in the 
relevant images) to construct an average F814W PSF for LAWD 37. 
Next, this average PSF was subtracted from the LAWD 37 image in 
each of the individual _flc images, which were corrected for CTE 
using the pixel-based correction (Anderson 2021). We then fitted 
the source and the other reasonably bright stars (unsaturated and a 
signal-to-noise ratio = 50) in each of these images using the F814W 
"library" PSF described in Sabbi & Bellini (2013). The PSF fits were 
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done by a chi-squared-minimization fit of the PSF model* to the 
star’s inner 3x3 pixels, after subtraction of a modal sky taken from 
an annulus with radii 8 to 12 pixels (Anderson 2016). This small 
fitting aperture was used in an effort to minimize any influence of 
the LAWD-PSF-subtraction residuals on the fit. The fitting solved 
for three parameters: a flux and a (x,y) position for each star in the 
raw frames. These raw positions were then corrected for distortion 
(Bellini et al. 2011). 

The _flc images were corrected for CTE using the most recent 
pixel-based correction (v2.0, which is currently in the WFC3/UVIS 
pipeline). Even with this correction, there are small residual CTE- 
related trends present. Since the various epochs were taken at different 
telescope orientations, we can inter-compare the measured positions 
to examine any residual trends. We transformed all exposures from 
all epochs into a common reference frame using all the unsaturated 
stars, apart from the source, brighter than a signal-to-noise of ~ 
100. We then fitted each star with an average position and proper 
motion. We then transformed these modeled positions back into the 
individual exposures to examine position residuals as a function of 
raw y position and instrumental magnitude (m = —2.5 x logioFe, 
where F- is the flux in electrons). This allowed us to determine 
a simple table T of relative corrections that removed the residual 
CTE trends as a function of instrumental magnitude. This correction 
had the form T [™m] (yaw /2048), and its typical amplitude was ~0.01 
pixel. 

The above analysis allowed us to construct generalized corrections 
for astrometry in the individual exposures. In order to measure the 
best possible reference-frame position for the source star over time, 
we based the final transformations on stars within +1 F814W mag- 
nitude of the source star’s brightness and within 625 pixels (25’’) 
of the source star’s position. There were 22 such stars. We used the 
Gaia catalog to predict a reference-frame location for each of these 
stars at the epoch of each exposure. We removed one star that had a 
predicted parallax greater than 1 mas, just to be sure we are dealing 
with distant stars. 

Using the Gaia-predicted positions in the reference frame and the 
observed, distortion and CTE-corrected positions of the stars, we 
solved for a 6-parameter linear transformation from each observed 
image frame into the reference frame. We then examined the resid- 
uals of this transformation and tweaked the positions and the proper 
motions of the individual stars to improve the fits. Note that HST as- 
trometric measurements are more precise than Gaia measurements, 
particularly for stars at V~18 or fainter, so it makes sense that the 
reference-frame positions can be improved via this iteration, while 
maintaining the average absolute properties of the reference frame 
(zeropoint, scale, bulk motion, etc) that only Gaia can provide. At 
this stage, one star was found to have larger than expected residuals, 
likely due to an unresolved companion. This star was rejected, which 
left us with 20 stars to use as the basis for the transformations into 
the reference frame. These reference stars have a nominal position 
precision of 0.02 pixel (0.8 mas) in each exposure, due to shot-noise, 
read-noise, and small errors in the PSF model, all of which are ran- 
dom sources of noise (see Fig. 2 in Bellini et al. 2014). Thus, we 
have a distortion- and CTE-corrected position for each of 21 stars 
(the source plus 20 good reference stars) in each of 87 exposure 
frames. These positions are then used in the analysis that follows to 
examine how the source position changes over time. 

Since the PSF is known to vary with time, primarily as a conse- 


2 https: //www.stsci.edu/hst/instrumentation/wfc3/ 
data-analysis/psf 
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Figure 3. Top row: Cutout images of the inner 31x31 pixels of LAWD 37 for each of the epochs (Epochs | to 9 are left to right). The "bumps" in PSF in the 
annular region between a radius of 7 and 10 pixels (marked in green on the plot) are clear. Bottom row: The same images, but with the average PSF subtracted 
(Epoch 4 was excluded from the average). The source can be seen close to LAWD 37 in the third, fourth, fifth, and sixth epochs. The large vertical black region 
at the center of each of these images corresponds to the saturated pixels at the center of each deep exposure of LAWD 37. 
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Figure 4. Schematic of how the source position data are believed to be 
generated within an epoch. The data receive a correlated error from lens PSF 
subtraction, which is the same for all data within an epoch (red). The data are 
also scattered with white noise (blue). 


quence of HST focus drift, there is a variation in the uncertainty in the 
quality of the PSF for each epoch. Critically, this results in a corre- 
lated error in the target source position within an epoch. Fig. 4 shows 
an example realization of noise believed to corrupt the astrometric 
measurements of the target source within an epoch. The lens PSF 
subtraction introduces a correlated, within-epoch scatter in addition 
to the white instrument noise scatter. In order to estimate the size of 
this correlated error for each epoch, we repeat the PSF subtraction 
for each epoch with the PSF obtained for each of the other epochs 
in succession. We use the distribution of residuals as a proxy for the 
size of the within-epoch correlated error in the target position. The 
mean of the distribution of the residuals for each epoch (m7, _.o,,) are 
shown in Table 2. 

In the analysis that follows, all astrometric data are on the tan- 
gent plane projected at reference position right ascension Qyef = 
176.46045340°, and declination 6;ef = —64.84488414°, on the 
ICRS. Coordinate (Qyef, Opes) corresponds to (1000, 1000) pixels in 
the reference-frame tangent plane with the first coordinate, X, having 
the direction of the local west unit vector and, the second coordinate, 
Y, having the direction of the local north unit vector. Units in both of 


Epoch 1 2 3 5 6 i 8 9 


0.04 0.12 0.6 0.6 0.4 0.6 0.04 0.04 


m Ge,corr mas 


Table 2. Estimated sizes of the correlated noise standard deviation due to 
the PSF subtraction in each epoch of data. These values were obtained via 
simulated PSF subtraction. Note that the size of the correlated noise tends to 
increase when the lens and source are closest. Epoch 4 is omitted due to the 
reasons outlined in Section 2.1. 


the coordinates are then scaled to be 1 mas (1 pixel=40 mas). In what 
follows we denote a position in this tangent plane as, = [X,Y]. 


2.2 Gaia astrometric solution 


GEDR3 provides reference positions, proper motions, and parallax 
values for both the source and lens. Additionally, each parameter’s 
standard errors and correlations, which are derived from a linear least 
squares fit of single-epoch astrometric measurements, are provided 
(Lindegren et al. 2021). For GEDR3, the astrometric solutions are 
based on measurements taken between July 2014 and May 2017°. 

The mean and covariance matrix for the source (GEDR3 source ID 
5332606350796955904) astrometric parameters in the tangent plane 
projection defined in Section 2.1 are, 


35186.9625 
45709.7257 
mS =| 8.89044 |, (2) 
0.03676 
0.19940 


0.2 -0.01 0.04 -0.06 -0.04 

-0.01 02 -0.05 0.003 0.05 
xg=10'| 0.04 -0.05 0.3 -0.02 -0.02]. (3) 

-0.06 0.003 -0.02 0.3 0.07 

-0.04 0.05 -0.02 0.07 03 


Here, the mean and covariance matrices have the parameter or- 
der, X0,5, Yo,s source reference positions (mas), x ,5, Hy ,s proper 
motions in the X and Y directions (mas/year), and zs paral- 
lax amplitude (mas). Similarly, for the lens (GEDR3 source ID 


3 https: //www.cosmos.esa.int/web/gaia/earlydr3 
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5332606522595645952), 


45825.73799 
46592.48286 
= |—2661.63959| , (4) 
—344.93250 
215.67527 


0.2  -0.03 -0.02 -0.04 -0.03 
-0.03 0.2 -0.03 0.01 0.07 
xS=1073|-0.02 -0.03 03 -0.05 0.02]. (5) 
0.04 0.01 -0.05 0.4 0.06 
-0.03 0.07 0.02 0.06 0.3 


a) 


This astrometric information from GEDR3 for both the source and 
lens will be used in Gaussian priors for the models described in Sec- 
tion 3. Fig. 5 shows the GEDR3 predicted unlensed source trajectory, 
and the HST astrometric measurements of the source. We can see that 
there are clear offsets from the GEDR3 unlensed source trajectory in 
the expected direction of predicted lensing signal, although the data 
are clearly noisy. 


3 MODELS 


Fig. 5 shows that we are in a low signal-to-noise regime, as the offset 
in the data relative to the projected GEDR3 unlensed position of the 
source is comparable to the size of the scatter at each epoch. It is 
therefore important that we investigate a range of models. In this 
section we describe the different models that were fitted to the data, 
and compared to one another. We fit models with and without the 
astrometric lensing signal and with and without a correlated noise 
component. We first address the choice of parameterisation of the 
astrometric lensing signal. Next, we outline the different likelihoods 
used in each of the models. Finally, we detail the prior distributions 
used in each model and the methods used to sample the posterior 
distributions in all models. 


3.1 Parameterisation of the microlensing signal 


Using the expression for 6, (Eq. 1), and calculating £, using the lens 
and source trajectories, the lensed source position can be seen to be 
dependent on 11 parameters, 


[Xo,s-Yo,s.Hx,s>Hy,s>%s»Xo0,7+Y0,1>UX,L»Hy,L» AL, Mz] . (6) 


Specifically, the source (ae = [Xo,s. Yo,s. Ux,s»Hy,s,as]) and 
lens (os = [Xo,L. Y0,L> MX.L»HY,L»7L]) astrometric parameters 
and the lens mass, M,. However, modeling the signal with these 
parameters is not straightforward. This is due to the fact that parallax 
enters the model in two different ways. Firstly, the source and lens 
parallaxes control the lens and source unlensed trajectories and hence 
the lens-source angular separation. Secondly, the lens and source 
parallaxes enter as distance terms and control the size of Og. The 
problem arises due to the interpretation of negative source parallax 
values when trying to include the GEDR3 astrometric solution as 
priors in the model. 

The GEDR3 reported value of the source parallax with standard 
error is 0.20 + 0.16 mas, where some of the distribution ms < 0 
(assuming a Gaussian distribution). Using negative parallax values 
when zg enters the model in the source trajectory is fine as this 
reflects uncertainty in the parallax component of source trajectory 
and is physical. However, a negative m5 value entering the model 
as a distance term and re-scaling Og, is not physical. In this case, a 
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negative m5 value would act to artificially increase @g and therefore 
potentially bias the inference towards lower M,. 

There are a number of ways to mitigate this problem. A simple 
solution is that Og could be fitted for instead of Mz. In this case the 
astrometric parameters of the model would be [oe ae Og]. Here 
the prior on zg only enters the model as a trajectory term and hence 
negative values are permitted. Finally, My can then be extracted after 
inferring a value for Op. 


3.2 Likelihoods 


In order to model the data, we need to setup a likelihood function that 
encodes our beliefs as to how the data were generated. Specifically, 
for all models, we assume the process that generated the data at time 
t;, for the source is of the form, 


61530, Mn) = T(t 0%) + €N (15,09). (7) 


Here, £ is the observed source position in the tangent plane given 
by model My y with trajectory component T and noise component 
N. £7 is the uncorrupted source position predicted by trajectory 
component T with astrometric parameters 6t, €% is an additive 
zero-mean Gaussian noise component N with parameters 6°, 
We consider two different trajectory models, with and without the 
astrometric lensing deflection term. The model without the deflection 
is, £7 (t;; 0°") = 2N(¢;, 0) = C(t), 0%"). Here, 


ig) = Ee +asJ'Ro(ti) (8) 


Yo,s 


xX 
ale (tj _ href) b 
HYy,s 


Re(t) are Cartesian Barycentric solar system coordinates in au of 
the Earth at time t. R@(t) was retrieved via the Astropy Python 
package (Astropy Collaboration et al. 2013, 2018) which uses values 
computed from NASA JPL’s Horizons Ephemeris*. J~| is the inverse 
Jacobian matrix of the transformation from Cartesian to spherical co- 
ordinates (e.g., Urban & Seidelmann 2014, Section 7.2.2.3) projected 
at the reference position (ef, Oref), aNd freg = J2016.0 is the GEDR3 
reference epoch. Eq. (8) is just the standard motion of the source 
with proper motion and parallax. The model with the deflection 
is £7(153 0%) = gP(1;, 0) = € (15; 08%) + 64(t13 (63%, 63, Op]) 
where 62° = [es 07, Og]. 

For all models, we assume that the noise is uncorrelated between 
the X and Y directions. With this in mind, we can write the likelihood 
for a set of Ke data points within an epoch, e. Let the data in epoch e be 
denoted as De = {te, Xe, Ye}, where the elements of D¢ are vectors 
of length Ke, and represent the times, X positions and Y positions 
of all data points within epoch e, respectively. The likelihood of data 
within an epoch e is then, 


P(Del8e, Mr) =N (XelX2 (tes 6), EN (92°%*)) 
| 9) 
XN (Yel¥! (te; 6%), EY (ani) 


Here N is the Ke dimensional multivariate Gaussian density, 
Oe = [ast gnoise) X! (te; 0°) and Y! (te; 6°) are vectors of 
X and Y source positions of length Ke obtained by evaluating the 
vector of times f for the trajectory model component T. LY (gnoise) 
is a Ke X Ke covariance matrix. We consider two noise models 
for the covariance matrix. An uncorrelated white noise model with 
ZY (gnoise) = EW (gnoise) =o? I, where I is the Ke x Ke unit 


diagonal identity matrix, and @2°© = [cwnite]. We also consider a 


4 nttps://ssd. jpl.nasa.gov/?horizons 
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Figure 5. HST astrometric follow-up data during the predicted microlensing event by LAWD 37. Left: Single HST astrometric measurements coloured by time 
are shown as circles. Measurements are clustered together in time within eight epochs of data. Also shown are 100 random samples of the source unlensed 
projected trajectory from the GEDR3 astrometric solution. Specifically, projected trajectories corresponding to samples ay" ~N (mg i xg), are shown in grey 


and the black dashed line corresponds to oe = mg. Middle and right: Projections of the source data in the X and Y directions, respectively. Small arrows 


indicate the predicted direction of the astrometric deflection signal. 


correlated and white noise model with ZN (92) = HC (gnolse) = 
mW ({Owhitel) +02 corrl where 1 is the Ke x Ke full ones matrix, and 
gnvise = [whites Te,corr]. The correlated noise model corresponds 
to the data generation process shown in Fig. 4. 

For all models, under the assumptions of independence between 
Ne epochs of data, the likelihood over the full data set is, 


Ne 
p(DIO,Mrn) =| | p(Dele,Mrw). (10) 


e=1 


Here, @ are all the model parameters and D = Os, is the 
full data set over all epochs. The consideration of two deflection 
model components and two noise model components leads to four 
distinct models to be investigated: a model with the deflection and 
correlation noise - Mpc; a model with the deflection and just white 
noise - Mpw; a model without the deflection and correlated noise 
- Myc, and a model without the deflection and just white noise - 
My w. Table 3 contains a summary of the different models and their 
components. 


3.3 Priors 


For the source and lens astrometric parameters (10 parameters to- 
tal), there is prior information from GEDR3. Specifically we as- 
sume multivariate normal distributions, p(Os) =N (m§, Bo), and 
poe) = N(mS, pe) using the values in Eqs. (3) and (5). 

There are three potential issues that need to be considered when 
using the GEDR3 astrometric solution as priors on the source and 
lens unlensed trajectory. The first two issues arise due to the im- 
plicit assumption that the GEDR3 astrometric solution for both the 
source and lens does not already contain some of the astrometric 
microlensing signal. This is a possibility as astrometric microlensing 
events typically have long tails (Dominik & Sahu 2000; Belokurov 
& Evans 2002), and could overlap with the data used to build the 


Model 

Mpc Mpw Mune Mnw 
Deflection v v x x 
White noise v v v v 
Correlated v x v x 
noise 
#ofparameters 20 12 14 6 
parameters 0 os, oe, oe, as", oe, 

om, Oz, O£,C white Owhite> Owhite 

Owhite> O corr 

O corr 


Table 3. Summary of the components of the four considered models. Deflec- 
tion indicates if the model contains the astrometric microlensing deflection 
term. Oe = [C1 corr, F2,corrs ++» ONe,corr] is the vector of correlated noise 
parameters. 


lens and source GEDR3 astrometric solutions. If the lens or source 
astrometric solution does contain a detectable part of the astrometric 
microlensing signal, this could potentially bias our inference as the 
lens and source astrometric parameters would not be representative 
of the true unlensed lens and source trajectories. 

Firstly, for the source, we have to check that when the GEDR3 
data were taken, there was not a significant lensing signal present. As 
LAWD 37 is so bright Gaia downloads cut-outs and carries out gating 
so when the background object is within 500 mas it is downloaded 
as part of the cut out but with less exposure time, otherwise, standard 
one dimensional processing is used (Fabricius et al. 2016). GEDR3 
is based on data collected between July 2014 and May 2017. In May 
2017, the predicted deflection of the source is < 0.2 mas and the lens 
source separation is >> 500mas. This is below the along scan (AL) 
precision fora G ~ 18 mag source (aay ¥ 0.8 mas with the standard 
Gaia pipeline; Rybicki et al. 2018; Bramich 2018; Everall et al. 2021). 
We therefore conclude that the astrometric lensing signal was not 
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Parameter Prior Description 

oe N (mg, =) GEDR3 prior for the source trajec- 
tory 

oe N (mS : XG) GEDR3 prior for the lens trajectory 

‘Twhite N(Mownite> a2) White component of the noise 
Mownite = 9-8, On = 0-1 

CO corr N(meorts o21 ) Correlated components of the noise 
for each epoch 

Or: U (20, 60) Uninformative prior of the angular 


Einstein radius 


Table 4. Summary of the parameter priors used in the models. 


detectable during the time GEDR3 data were collected and therefore 
did not significantly influence the GEDR3 astrometric solution of 
the source. Secondly, for the lens, we have to check if the shift due 
to the blending with the minor image was detectable during GEDR3 
(see Eq. (15) in Bramich 2018). In May 2017 this shift was ~ 10713 
mas, we therefore safely conclude that the lens GEDR3 astrometric 
solution does not contain a significant astrometric lensing signal. 

Finally, we have to consider if the GEDR3 astrometric solution of 
the G ~ 18 mag source has likely been influenced by the presence 
of the comparatively bright G ~ 11 mag lens. The current Gaia 
processing pipeline is able to resolve sources for separations in the 
most optimal cases down to ~ 200 mas (Arenou et al. 2018). While 
the lens-source contrast ratio is far from optimal in our case, in May 
2017, the lens and source had a predicted separation of ~ 6400 mas. 
At this separation, the lens and source were unlikely to be close to 
each other on the Gaia focal plane during the GEDR3 time baseline. 
Therefore, it is safe to conclude that the source GEDR3 astrometric 
solution is unlikely to have been significantly affected by the presence 
of the lens. Overall we conclude, for both the lens and source, the 
GEDR3 solutions are safe to use as priors on the unlensed source and 
lens trajectories in the models. Finally, it is worth mentioning that 
the astrometric solution priors from GEDR3 will affect the precision 
at which we can measure the mass of the lens, we defer discussion 
of this point to Section 5.3. 

For the white noise parameter present in all models, we set 
P(Owhite) = N (FipitelewnaOe) where Mo yy = 0-8 mas and 
On = 0.1 mas, reflecting the estimated instrument precision of 
WFC3/UVIS (Bellini et al. 2014). For the correlated noise param- 
eters Ocorr = [C1 corr, 72,corrs «++» TNe,corr] we assume a Gaussian 
prior p(O corr) = N(GcorrlMcorr, ont ), truncated at zero to avoid 
negative values. Here, Meorr = [Mo cos", corr?" Mon, corr | 
are the estimated size of the correlated noise components in Ta- 
ble 2 and we have assumed no correlation between epochs. Fi- 
nally, we then assume an uninformative uniform prior on Og, as 
p(Og) = U(Og|lower = 20 mas, upper = 60 mas). In all models, 
we build the full prior by taking the product over all the priors of 
the required parameters. Table 4 contains a summary of all param- 
eter priors, and Fig. 6 shows the probabilistic graph illustrating all 
parameter dependencies and structure of the models. 


3.4 Sampling the posterior 


Now that we have constructed the prior and likelihoods for each 
model, we may compute the posterior distribution on the model 
parameters. The posterior distribution or the probability distribution 
of the model parameters given the data, is given by Bayes rule, 


p(D|d, M)p(6|M) 
p(D|M) 


p(9|D, M) = x p(DIA,M)p(a|M). 1) 
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Figure 6. Probabilistic graphical model showing the dependence structure 
of the models considered in this work. An arrow from one node to another 
indicates a conditional dependence. No arrow between two nodes means they 
are conditionally independent. Unfilled circles are latent random variables 
in the models or parameters that are fitted for. Filled small circles are fixed 
values in the model (parameters for the informative prior distributions). The 
shaded circles are the observed data. Parameters inside a plate are repeated 
for each epoch and then direction. Parameters outside the plates are global 
parameters. Black parts of the graph are common to all models considered. 
Purple parts are common to models with an astrometric deflection. Pale green 
parts of the graph are common to models with correlated noise. 


We obtain samples from the posterior distribution using an MCMC 
algorithm. Specifically, we use the No-U-Turn Sampler (NUTS) 
Hamiltonian Monte Carlo algorithm (Homan & Gelman 2014) im- 
plemented by the PyMC3 Python package (Salvatier et al. 2016). 
NUTS allows samples from the posterior distribution to be obtained 
faster than other traditional MCMC samplers (e.g., Foreman-Mackey 
et al. 2013) because it uses first-order gradient information to effi- 
ciently step through the parameter space. We also take advantage of 
using the dense full mass matrix step implemented in the Exoplanet 
Python package for further performance gains (Foreman-Mackey 
et al. 2021). 

For each model investigated in this work, we run NUTS for 2000 
tuning steps and then for a further 10, 000 steps. This is done for two 
independent chains to permit between-chain convergence checks. 
Specifically, we compute the rank-normalized R convergence diag- 
nostic for each inference in this analysis. R measures convergence 
by comparing between-chain and within-chain variance for each pa- 
rameter. A value R > 1.01 indicates poor convergence (Vehtari 
et al. 2021). We find for all parameters in all inferences considered, 
R = 1.0, meaning good convergence. Running both chains for a 
model typically took 10 Central Processing Unit (CPU) minutes. 


4 MODEL COMPARISON AND CRITICISM 
4.1 Leave-one-out cross-validation 


We are fitting four models to the datas Mpc - a model with the 
astrometric microlensing deflection and correlated noise, Mpw - 
a model with the astrometric microlensing deflection and just white 
noise, Myc - a model with no astrometric microlensing deflection 
but with correlated noise, and finally, My w - a model with no 
astrometric microlensing deflection and just white noise. We then 
need to assess which model best explains the data and critically 
examine the strengths and weakness of each of the models. To do this, 
we use the Bayesian Leave-One-Out cross-validation score (LOO). 
LOO is one method to estimate point-wise out-of-sample prediction 
accuracy of a given model (Vehtari et al. 2017). LOO is calculated 
for a given model by fitting the model to the data set where one of the 
data points has been left out. The posterior samples of that fit are then 
projected through the model likelihood to assess how well the left-out 
data point is predicted by the model. The procedure is then repeated 
so each data point is left out in turn. For a given model, this provides 
a per data point score which can be totalled over the data to give 
an indication of overall model performance, or compared data point- 
wise with a different model allowing an interpretable comparison 
between models. 

Specifically, for a model M, which being fit to the full data set D, 
the LOO score for the ith data point, Dj = {t;, Xj, Y;}, is 


LOO;, y = log p(Di|D-i, M). (12) 


Here D_,; is the full data set D with the ith data point or D; removed. 
This is the log of the LOO predictive density conditioned on the data 
set without the ith data point. It can be written in terms of expected 
value of the likelihood of the left-out data point over the posterior 
distribution obtained while fitting the model to D_;, 


PORE fi p(Dil8.D-i,M)p(6|D-i,M)d0. (13) 


Practically, if we have S samples from the posterior distribution of 
the model parameters {0° ‘oa obtained by fitting model M to the 
data set D_;, we can use these samples to compute Eq. (13) as, 


S 
1 Ss 
p(D;|D_i, M) = § DL PDID-1 02M) (14) 


Here, we have assumed that we have a sufficient number of samples 
to fully capture the posterior distribution (Gelman et al. 2014). 

This quantity can be totaled over the data to give an overall score 
for model M, 


N 
LOO 4 = )) LOO; m- (15) 


i=l 
Alternatively the difference can be used to compare the performance 
of two model M, and Mp, 
ALOO y,,,mM, = LOO py, - LOOp,. (16) 


Here, a positive value indicates that Mj, has a higher computed out- 
of-sample predictive accuracy than Mp). The standard error on this 
difference is given by Vehtari et al. (2017) as, 


se (LOO 4, My) = YNVN, (LOO;, m4, - LOO, j,)- (17) 


ve is the variance of the point-wise difference over the full data 
set of N data points. This can be used to assess how significant 
the difference between two models is. For brevity, we define the 
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significance of the difference as sig(e) = | e |/se(e) where e = 
LOO mM,,M): 

In the case of this analysis, it is also informative to sum over 
the point-wise LOO score over the data in a single epoch, e. This is 
because the data are tightly temporally clustered within an epoch, and 
in the correlated noise models (Mpc, Myc), data within an epoch 
share correlated noise properties. We define the difference in LOO 
predictive accuracy over an epoch e, as ALOO‘;, Mo" This quantity 
is analogous to ALOO jy, 1, (defined in Eqs. 15 and 16), but instead 
of summing the point-wise score over all data points, the sum is taken 
only over the data in epoch, e. The standard error, se(ALOOY;, : Mo)? 
is also calculated analogously by instead calculating the variance of 
the point-wise difference of data within the epoch. In this case, N 
in Eq. 17 is the total number of data points within epoch e. Overall, 
computation of ALOOS Mp will permit the comparison of models 
at epoch resolution. 

Calculating all LOO, j4 terms for all models is computationally 
expensive. This is because it would require N full refits (obtaining 
samples from the posterior distribution) of the model with each data 
point left out in turn. In our case, N = 81 and a single refit of a 
model takes = 10 CPU minutes, therefore computing all LOO;, jy 
terms for the four considered models would take ~ 55 CPU hours of 
computation. While not completely unfeasible, the required compu- 
tation is still significant. We therefore turn to an importance sampling 
approximation to compute the LOO;, ,y terms for each model. 

We use the Pareto Smoothed Importance Sampling (PSIS; Vehtari 
et al. 2015) approximation to compute the LOO;, 4 terms for each 
model (Vehtari et al. 2017; Biirkner et al. 2020), implemented in the 
Arviz Python package (Kumar et al. 2019). Instead of refitting the 
model with each data point left out in turn, in the PSIS approximation 
the model is initially fit to the full data set. Then posterior samples 
from the full data set fit are re-weighted (via importance sampling) 
to approximate the effect of removing each data point in turn. Over- 
all PSIS allows the fast and approximate computation of the LOO 
terms for a model with very few refits. Appendix A contains the 
application details of the PSIS approximation along with checks of 
the approximation accuracy for the models considered in this work. 


4.2 The case for LOO over other comparison metrics 


LOO is just one of many metrics that can be computed to assess the 
performance of a model. In this section, we briefly justify the decision 
to use LOO as a model criticism and comparison tool compared to 
the two commonly used approaches in astronomy: a reduced Ay 
approach, and use of the Bayesian model evidence. 

Typically in microlensing event analyses, albeit in analyses of 
photometric microlensing events, a reduced Ay? approach is used 
to select between competing models (e.g., Bond et al. 2004; Smith 
et al. 2005; Alcock et al. 2000; Bennett et al. 2018). There is a 
multitude of reasons why we do not use it here and choose LOO 
instead. Firstly, because some of the models considered in this work 
contain Gaussian-correlated noise components, reduced A x? is no 
longer fully descriptive of the likelihood of the model (reduced A y? is 
only related to the likelihood of an uncorrelated Gaussian likelihood 
with a diagonal covariance matrix). Secondly, reduced A a0 is only 
valid for a model that is linear in its parameters; none of the models 
considered in this work are linear in all their parameters. Thirdly, 
reduced Ay? fails to account for any posterior uncertainty on the 
parameters. Andrae et al. (2010) gives an extensive account of the 
pitfalls of using reduced Ay? for comparison of non-linear models. 
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Parameter Unit Model 
Moc Mpw Mnc Mnw 

X0,s mas 35186.9979)}42,. 35187.0279) 43, 35186.9439)°5. 35186.9469.) 7% - 
Yo,s mas 45709.705°)145,, 45709.675°))74, 45709.7269.)23,, 45709.7419) 43, 
HX,S mas/year 8.9640.04 | get oe. 9.011.045 , 9,0420.045 , 
HY,S mas/year 0.02804" | =0.012° 98, 0.1599. 085 - 0.2899.043 

ms mas 0.123944, 0.04308, 0.127299, 0.070%) 1g 
Cente mas 0.838004, 0.993008, o.se222H6, 1.22500, 
X07 mas 45825.7380 011, 45825.738°0 5, - 2 

Yo. mas 46592483001). 46592.483005 - - 

Hx.L — masiyear -2661.640%)i,, ~2661.640%2 hs - 

HY,L mas/year —344.9320 001 -344.9329 000 5 = - 

AL mas 215.6759 018 -215.676008, - 

ve mas 31.3537, 34. 1641386) ; 

Picor —mas 0.081255 0.0880 

Crcon mas 0.127985 0.20007, + 

Tico mas 0.57 h 0.62795, 

eso mas O60, 0.74100) 

Toco mas 0.380% "By - oar, 

Crcon mas 0.6819) - 0.1990, 

Tene Soe 0.07809'053 - 0.120.078 : 

veo mas 0.0859 iy 0.103009, 


Table 5. Parameter posterior summaries for each model. Values are the posterior medians, uncertainties are the 84th-50th and 50th-16th posterior percentiles. 


’.” indicates that the considered model does not contain the parameter. 


Comparison of models using the Bayesian evidence (the denom- 
inator in Eq. 11) is becoming popular in astrophysics due to nested 
sampling algorithms that readily allow its computation (e.g., Skilling 
et al. 2006; Higson et al. 2019; Speagle 2020). The Bayesian evidence 
has the appealing properties of fully capturing parameter uncertainty 
and naturally penalizes more complex models that do not significantly 
explain the data better. The critical downside, however, is that the ev- 
idence is sensitive to the choice of prior distribution (see e.g., Fong 
& Holmes 2020). This becomes a problem when comparing models 
possessing parameters with uninformative and somewhat arbitrarily 
set prior distributions (see e.g., Section 7.2 of Gelman et al. 2013). 
For example, for @g in this analysis, the arbitrary choice of a large 
width for its uninformative prior can arbitrarily change the model 
evidence without any resulting change of the posterior distribution. 

Comparatively, LOO is not sensitive to the model priors. This 
is due to each LOO; jy term being computed when the model is 
conditioned on the rest of data (Eq. 13), so the prior is always 
overwhelmed by the likelihood and has little effect. Moreover, the 
Bayesian evidence only provides a single summary statistic for the 
whole model and data, shedding little light on precisely where a 
model fails, whereas LOO provides an interpretable per data point 
score. 


5 RESULTS 
5.1 Performance of the models 


All models (Mpc, Mpw, Mnc, Mnw) were fitted to the data. 
Posterior parameter summaries for all parameters can be found in 
Table 5, and posterior projections of the model over the data are 
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shown in Fig. 7. Additionally, LOO scores were computed for all 
models as described in Section 4.1. 


Table 6 shows the pairwise comparison between all model com- 
binations and the total LOO scores. Mpc (deflection and cor- 
related noise model) has the best overall LOO score when com- 
pared to all other models. My w is the least preferred model with 
all other models having a higher score. The model most com- 
petitive with Mpc, is Myc with ALOOM,6,Mnc = 8-6 and 
sig(ALOO Mine,Myc) = 1.5. This means that the correlated com- 
ponent of the noise is an important feature of the model, since 
the deflection model with just white noise (Mpw) is compara- 
bly not competitive with Myc. In fact, Table 6 shows that Myc 
is preferred over Mpw with ALOO\y6. Mpw = —20.9 and 
sig(ALOO Myc,Mpw) = 2-2. That is, the non-deflection corre- 
lated noise model is preferred over the model with the deflection and 
just white noise. While Mpc is the overall preferred model, it is 
informative to understand how exactly Myc is able to explain the 
data with no deflection term and even beat the deflection model with 
just white noise, Mpw. 


The starting point for understanding the good performance of 
Mvyc is to examine the per-epoch LOO scores. Fig. 8 shows the 
per-epoch LOO scores for Mpw and Mpc compared to Myc. 
For the comparison of the two correlated noise models (Mpc versus 
Myc), itis shown that Mpc marginally beats My c in every epoch 
with the exception of epoch 2 where Mpc is more clearly preferred 
and epoch 7 where Myc beats Mpc. The reason why Myc can 
explain the epoch with the largest deflection terms (epochs 5, 6, and 
7), is that it inflates the correlated noise and alters the unlensed source 
trajectory. 


Fig. 9 shows the priors and posterior distributions of the noise 
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Figure 7. Posterior realizations plotted over the data in the X and Y directions and for each of the considered models. Coloured bands show the 84th-16th 
posterior percentiles on the inferred source trajectory for each model and direction. Dark (light) grey bands are the posterior data realizations 1 6th-84th (2nd-98th) 
percentile bands. Specifically this includes the trajectory and the noise model component realizations. The posterior data realizations are discontinuous between 
epoch because the noise model is only defined within an epoch and on the data grid. 


parameter in all of the models. For the correlated noise parameters, 
Myc inflates the size of the noise parameters relative to Mpc and 
the prior for epochs 5,6 and 7. Myc does this to try and explain the 
deflection signal. For epoch 2, the correlated noise term in Myc 
is slightly inflated compared to Mpc and the prior. At first glance, 
this seems counterintuitive because the deflection at epoch 2 is small. 
This begs the question as to what signal Myc is trying to explain 
away with increased correlated noise. The reason for this is that, 
although the signal at epoch 2 is small, so is the estimated magnitude 


of the correlated noise, and so epoch 2 has one of the highest signal- 
to-noise ratios of all the epochs. Overall, this means correlated noise 
can mimic the deflection signal. It is noted that in epoch 7 and 
for the white noise both Myc and Mpc inflate the noise terms 
relative to the prior. This suggests that the priors underestimate both 
of these quantities. It is also noted that for models missing either the 
correlated noise or the deflection signal, the white noise is increased 
to compensate relative to Mpc and the prior. 


Fig. 10 shows the GEDR3 priors and posterior distributions of 
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ALOO;ow, column 

Mpc Mpw Mne Mnw 
Mpc - 29.5(3.5) 8.6(1.5) 76.5(4.6) 
Mpw ~29.5(3.5) - ~20.9(2.2)  47.1(3.4) 
Mne -8.6(1.5)  20.9(2.2) 7 67.9(5.4) 
Mnw -76.5(4.6) -47.1(3.4) -67.9(5.4)- 
LOO jy —215.6 -245.0 —224.1 -292.1 
se(LOO,y) 38.3 37.8 33.2 26.0 


Table 6. Pairwise difference in LOO scores for all models considered. 
Positive number means the model in the row is preferred. Significance, 
sig(ALOO;ow,column), Of the scores are indicated in with the parentheses. 
Total LOO score for each more and its standard error are shown in the bottom 
two rows. In order of descending LOO score (highest to lowest predictive 
accuracy) the models ae Mpc, Mnc, Mpw and Mnw. 


-154 @ X=Moc 

@ X=Mow 

1 2 3 4 5 6 7 8 9 
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Figure 8. LOO score of models Mpc and Mpw compared to Myc 
within each epoch. Error bars are one standard error. 


the source astrometric parameters. While there is broad agreement 
between all models for the source proper motion in the X direction 
and the source parallax, Myc and Mpc disagree on the Y direction 


proper motion posterior. Mpc infers py 5 = 0,028 mas/year, 
whereas Myc infers uy 5 = O50 ie mas/year. The relatively 


high value inferred by Myc is caused by Myc trying to explain 
away the positive Y direction deflection (see e.g., Fig. 7) by altering 
the source trajectory. This means that further data taken after the 
event to pin down py 5 could completely rule out Myc as a viable 
model. Encouragingly, we note that for all models the lens and source 
astrometric parameters are consistent with the GEDR3 priors (see 
Figs. B2 and BI in Appendix B). 

The reason for the better performance of Mpwy compared with 
Mwvyc in epoch 2 is a high signal-to-noise deflection (as mentioned 
earlier), combined with the inflexibility of the source trajectory to be 
altered to explain away an offset in the negative X direction. This is 
due to the asymmetrical deflection in the X direction being in the 
negative (before the closest approach) then positive (after the closest 
approach) X direction. The source trajectory cannot be altered in 
Mw c to account for both of these offsets, so Myc performs worse 
than Mpc in epoch 2. The better performance of Myc compared 
to Mpw in epoch 7 is due to the outlying data within that epoch 
(see Fig. 7). These outlying data are located at lower Y values which 
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are further away from the deflection trajectory of Mpc than the 
unlensed trajectory of Myc. 

For the per-epoch comparison of Myc and Mpw, Fig. 8 shows 
that Myc beats Mpywy in every epoch apart from epoch 2. The 
relative performance of Myc and Mpw can be explained by the 
same reasoning used above, for the Mpc and Myc epoch com- 
parison. In epoch 5, Fig. 8 shows Mpwy clearly performs worse 
than Myc, despite there being a large deflection signal in epoch 5. 
This is due to epoch 5 also having a large correlated noise estimate 
(M Ge corr = 9-6 mas) which white noise Mpw cannot explain away, 
even by inflating the size of the white noise (see Fig. 9). 


5.2 Inference on the angular Einstein radius 


Both models including the astrometric lensing deflection signal 
(Mpw and Mpc), provide a posterior inference on Og. Fig. 11 
shows the @g marginal posterior distribution for both the Mpw 
and Mpc models, along with the prior used in both models. For 
Mpw and Mpc, the inferred values are Og = 34.at4 mas and 


Og = 1a mas, respectively. Here the values and upper and 
lower error bars represent the 50th, 84th-50th, and 16th-50th pos- 
terior percentiles, respectively. Fig. 11 shows that Mpw _ provides 
a tighter constraint and slightly higher value of @g compared with 
Mpc. It is also shown that the Mpc Qk posterior distribution is 
slightly asymmetrical with more probability mass towards lower Og 
values. 

The difference in the Og posteriors between Mpw and Mpc is 
consistent with the findings in Section 4. In Section 4, it was shown 
that the correlated noise part of the model (with help from a pos- 
itive {ly 5) can mimic or explain away some of the microlensing 
signal. Therefore, jointly fitting both the correlated noise and the mi- 
crolensing signal means the observed offset on position explanation 
is shared amongst the microlensing signal and correlated noise, caus- 
ing a slightly lower posterior median Og (as 64 o Og for u > 1) for 
Mpc compared with Mpw. Conversely, because the white noise 
model in Mpwy cannot explain correlated noise at each epoch, it 
inflates the values of Og to compensate. The larger spread in the Og 
posterior for Mpc is likely similarly due to the fact the correlated 
noise can mimic the microlensing signal, mean that there is some 
degeneracy between them, which ultimately leads to a less certain 
inference on Og in Mpc. 

We now use the posterior samples to calculate an inferred value 
for My, the mass of LAWD 37. Inverting the expression for Og, we 
can write the My, in terms Og and zy, — zg. Of course, we run into 
the same difficulties described in Section 3.1 of negative a5 values 
entering as distance terms in Og. This is not a problem because the 
source is so distant, 75 ~ 0.1 mas, compared with the lens, 77, ~ 215 
mas, we can approximate my, — m5 ~ my. Under this approximation 
we get M, for the correlated noise model, My = 0.56 + 0.08Mo, 
and My; for the white noise model, My, = 0.6670. "6 Mo. Here, we 
report the 50th, 84th-16th, and 50th-16th posterior percentiles. We 
note inclusion of zs both with negative posterior samples and with 
the negative posterior values truncated, does not change the reported 
mass values or uncertainties for either model. 


5.3 Prior sensitivity 


We can check how sensitive the posterior inference on Og is to the 
prior assumptions in Mpc. Specifically, we can test how tightening 
or relaxing the prior parameter distributions affects the Og posterior 
distribution with the view of determining which of our assumptions 
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Figure 9. Prior and marginal posterior probability density functions for the noise parameters in the different models. Vertical dashed lines show the values 
of these parameters estimated from the PSF subtraction simulations (Table 2). The prior probability density function is shaded to aid differentiation with the 
posteriors. Note that the models with no correlated noise, Mpw and My w, do not have correlated noise parameters and therefore, only the posteriors on 


Owhite are shown. 


strongly affect our inferences. For both the lens and source unlensed 
trajectory, we used informative Gaussian priors from GEDR3. The 
right panel of Fig. 12 shows how changing (inflating or shrinking by 
a multiplicative factor) the GEDR3 prior covariance matrix on the 
source (9) or lens (9), changes the @g posterior constraint. Fig. 


12 shows that the inference on Og is insensitive to changing z, 
Specifically, the posterior constraint on @g does not degrade until 
=P is inflated by a factor of 1000 (or equivalently multiplying the 
standard deviation by a factor of 1000). Moreover, it is also shown 
that shrinking ze (even shrinking by a factor of 10007) does not 
improve the posterior constraint on Og. This means that further data 
on the lens position, either by future Gaia data releases or further 


HST monitoring, is unlikely to improve the constraint on Og. 


The right panel of Fig. 12 also shows the effect of changing the 
source unlensed trajectory prior covariance matrix ze, specifically 
that the outcome is somewhat more sensitive to the prior on the source 
unlensed trajectory compared with the lens trajectory. Fig. 12 shows 
that inflating the prior by a factor of 10? immediately starts to degrade 
the inference on Og, although the degradation does level off past this 
point. This is likely because at an inflation level beyond 1007, the 
source unlensed trajectory is completely determined by the HST data 
and the source unlensed trajectory prior becomes uninformative. 


More importantly however, shrinking the prior on the unlensed 
source trajectory does improve the posterior constraint on Og. Specif- 
ically, an improvement of a factor of 10? in the prior covariance of the 
source unlensed trajectory would lead to a maximum improvement 
in the 16th-84th posterior constraint on Og of ~ 1 mas (correspond- 
ing to a 7% improvement on 84th-16th percentile constraint on Mz). 


While an improvement of a factor of 10? of the source astrometry 
covariance matrix is unlikely to be achieved, this means that further 
astrometric data pinning down the unlensing source trajectory either 
by future Gaia data releases or HST will likely improve the poste- 
rior constraint on Og. Assuming a more realistic improvement of 
2? on the source astrometry covariance matrix, corresponds to a 3% 
improvement on 16th-84th percentile constraint on M,. 


The left panel of Fig. 12 shows the effect of inflating or shrinking 
the standard deviation on the Gaussian prior for the noise parame- 
ters (7), both correlated and white. The means of these priors were 
informed by lens PSF subtraction simulations and WFC3/UVIS in- 
strument precision, in the case of the correlated and white noise 
terms, respectively. The standard deviations of these priors, (a), 
were however chosen to be 0.1 mas. Fig. 12 shows that the analy- 
sis is sensitive to changing o;,. Specifically, if 0, is increased from 
0.1 mas to 0.4 mas, the Og 16th-84th posterior percentile degrades 
by ~ | mas. In contrast, shrinking oa, to 0.01 mas only marginally 
improves the @g¢ 16th-84th posterior percentile, by ~ 0.2 mas. This 
sensitivity is reflected in the posterior distributions of the correlated 
noise parameters shown in Fig. 9. This is because for the majority 
of epochs the size of the correlated noise is not constrained by the 
data and defaults to the prior distribution, for the adopted value of 
On = 0.1, indicating the priors are informative. 


However, for epoch 7, the data does inform the value of the size 
of the correlated noise for Mpc and the posterior is slightly shifted 
to higher values than the prior. This highlights the important balance 
between picking a reasonably tight prior to inform the size of the 
correlated noise but one that reflects reasonable uncertainty so the 
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Figure 10. Comparison of the source GEDR3 astrometry used as a prior 
in all of the models compared with the posterior on the astrometry from 
each of the models. All panels show a probability density. For the 2D plots, 
the inner and outer contours contain 68% and 95% of the probability mass, 
respectively. The histograms show the marginal probability densities for each 
parameter. We have omitted the GEDR3 prior and model posteriors for the 
source reference positions (X05, Yo,s) because we found good agreement 
between GEDR3 priors and all the model posteriors (see Fig. B1 for the full 
corner plot). 
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Figure 11. Posterior distribution on © for the deflection model with corre- 
lated noise Mpc and the deflection model with white noise Mpw. Re- 
ported Og values are the 50th posterior percentile and the 84th- 16th posterior 
percentile uncertainties. The prior used for @g in both models is shown in 


grey. 
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Figure 12. Sensitivity of the posterior constraint on Og (16th-84th posterior 
percentile) to changes in the priors in the deflection and correlated noise 
model Mpc. Left: Posterior constraint versus o7,, the standard deviation 
on the Gaussian prior on each of the correlated noise and white noise pa- 
rameters. In this case, the priors on the source and lens astrometry are fixed 
to the GEDR3 ze, and ne, respectively. Right: Posterior constraint versus 
altering the source or lens astrometric Gaussian prior covariance matrix by a 
multiplicative factor f , whilst keeping all other priors fixed. 


data can alter the value if there is constraining information. The 
adopted value of o;, = 0.1 mas in the analysis strikes a reasonable 
balance between our confidence in the size of the correlated noise 
at each epoch, whilst also giving sufficient flexibility for the data to 
further inform the size of the noise. Overall, the posterior constraint 
on Of is sensitive to our choice of a, = 0.1 mas and our prior 
knowledge on the correlated component of the noise in general. 


6 THE ASTROPHYSICS OF LAWD 37 


At a distance of 4.6 pc, LAWD 37 is the second nearest single white 
dwarf to the sun after van Maanen’s Star. Direct imaging of the region 
around LAWD 37 with HST shows no evidence of visible compan- 
ions down to detection limits (Schroeder et al. 2000). Comparing 
Hipparcos and GDR2 astrometry, Kervella et al. (2019) identified a 
proper motion anomaly for LAWD 37 which could be explained by 
a massive companion. However, upon comparison between Hippar- 
cos and the more precise GEDR3 astrometry for LAWD 37, this has 
largely been ruled out (Lindegren & Dravins 2021). Overall, at the 
present time it appears that LAWD 37 has no detectable companions. 
Being aclose by and bright target (V-band ~ 11.50), LAWD 37 has 
been the subject of numerous studies (e.g., Koester & Weidemann 
1982; Weidemann & Koester 1995; Dufour et al. 2005; Bergeron et al. 
2001; Holberg et al. 2008; Subasavage et al. 2009; Giammichele et al. 
2012; Sion et al. 2009; Subasavage et al. 2017; Coutu et al. 2019). 
As a result, a wealth of photometric and spectroscopic information 
has been gathered. LAWD 37 has a spectral type DQ indicating a 
helium dominated atmosphere and the presence of carbon lines in its 
spectrum. The presence of carbon in DQ white dwarf atmospheres is 
well explained by models of carbon settling and then being caught up 
by the helium convection zone bringing it to the surface (Bédard et al. 
2022). In line with all helium rich white dwarfs LAWD 37 is expected 
to have no or trace amounts of hydrogen which is represented by a 
thin hydrogen layer in models, gy = 10-!° (Dufour et al. 2005). 
Fitting DQ-type white dwarf atmospheric models (Blouin et al. 
2018; Blouin & Dufour 2019) to the broad-band photometry (V,R,I; 
Subasavage et al. 2017, J,H,Ks; 2MASS Skrutskie et al. 2006), and 
spectroscopy of LAWD 37 (see section 4.1 in Subasavage et al. 2017, 
for the fitting method), its inferred atmospheric parameters are, Tap = 
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Figure 13. Best fit atmospheric model (red) over the photometric and spec- 
troscopic data (black) of LAWD 37. Top: Photometric fit with VRI bands 
from Subasavage et al. (2017) and JHK, bands from 2MASS (Skrutskie et al. 
2006). Bottom: Optical spectrum of LAWD 37 from (Subasavage et al. 2017). 
The photometry and spectrospcopy of LAWD 37 were fit jointly using the 
method detailed in Section 4.1 of Subasavage et al. (2017), and we can see 
that the models and data are in agreement. 


7837*85K, log g = 7.983 + 0.016, and log[C/He] = -5.61 + 0.14 
(see Fig. 13). This fitting also permits the solid angle of LAWD 37 
to be determined which in turn allows the radius of LAWD 37 to 
be calculated directly using distance (parallax) information from 
GEDR3. The implied radius of LAWD 37 is R = 0.0127+0.0003Ro. 
Assuming that LAWD 37 follows the standard evolutionary theory for 
CO core white dwarfs (Bédard et al. 2020) with a thin hydrogen layer 
(qH = 107!°) its radius corresponds to a mass of 0.57 + 0.01Mo. 

The direct gravitational mass determination from the astromet- 
ric microlensing caused by LAWD 37 is completely independent of 
all atmospheric and evolutionary modeling assumptions. This grav- 
itational mass can therefore be used to test the theoretical models. 
Fig. 14 shows the position of LAWD 37 on the theoretical MRRs, 
obtained from the Montreal theoretical cooling tracks>, using the 
gravitational mass from the astrometric microlensing event. Fig. 14 
shows excellent agreement between the gravitational microlensing 
mass from model Mpc and predicted value from the evolutionary 
theory of CO core white dwarfs, 0.57 + 0.01Mo, and assuming a 
thin hydrogen layer. Also plotted in Fig. 14 is the MRR for CO core 
white dwarfs with a thick hydrogen layer (not expected for LAWD 
37). For comparison, also shown in Fig. 14, is the theoretical MRR 
for zero temperature white dwarfs with an iron (Fe) core (Hamada & 
Salpeter 1961), which is definitively ruled out by the microlensing 
mass. 

Coutu et al. (2019) observed that the main peak of the mass distri- 


5 Obtained from interpolation of the Montreal theoretical cooling 
tracks available at https://www.astro.umontreal.ca/~bergeron/ 
CoolingModels/. 
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bution for DQ white dwarfs in their sample was shifted by ~ 0.05Mo 
relative to that obtained for the DA and DB white dwarfs, suggesting 
that there could be a problem with the DQ models used in their mass 
determinations (the same models as those used here). This suspicion 
was supported by comparing their mass for Procyon B determined 
with the photometric/spectroscopic method (0.554Mo) with the dy- 
namical mass determination of 0.592Mo determined by Bond et al. 
(2015). An alternative explanation for the discrepancy between the 
mass distributions of DQ white dwarfs and DA/DB stars has recently 
been offered by (Bédard et al. 2022, see their Section 4.3) who ar- 
gued that although all DB stars likely experience carbon dredge-up, 
this phenomenon is more significant — and thus more easily de- 
tected — in lower-mass objects, thus explaining the relative deficit 
of high-mass DQ stars. The good agreement between our gravita- 
tional mass for LAWD 37 and the hybrid photometric/spectroscopic 
value reported here yields support to this interpretation, although the 
uncertainty on the microlensing mass remains too large to rule out 
any systematic problem with the current DQ model atmospheres. 

Fig. 14 shows the position of LAWD 37 on the theoretical 
Hertzsprung-Russell diagram (luminosity versus effective temper- 
ature) along with theoretical cooling tracks (assuming gy = 10719) 
from the Montreal database for a range of masses. The position of 
LAWD 37 in the Hertzsprung-Russell diagram is in agreement with 
the expected position of a white dwarf with the inferred microlens- 
ing mass of 0.56 + 0.08Mo. Also shown in Fig. 14 are isochrones 
for 1.0, 1.2, and 1.5 x10°years. By interpolation of the Montreal 
theoretical cooling tracks, the implied cooling age of LAWD 37 is 
1.15 + 0.04 x 10° years. 


7 DISCUSSION AND CONCLUSIONS 


We have analyzed HST follow-up data of a predicted astromet- 
ric microlensing event caused by the nearby white dwarf, LAWD 
37. Specifically, we used WFC3/UVIS HST astrometric data of the 
source in combination with Gaia astrometry (GEDR3) of the source 
and lens to infer a gravitational mass for LAWD 37 of 0.56+0.08Mo. 
We consider and fit four different models to the data. Models with 
and without the astrometric deflection term, and then each with and 
without a correlated noise component due to the lens PSF subtraction 
within an epoch (Mpc, Mpw.Mnc.Mnw). We find the model 
with the deflection term and correlated noise (Mpc) best explains 
the data according to the overall LOO score. 

The model that provides the next best explanation for the data 
according to the LOO score is the model without an astrometric 
deflection but with correlated noise (Myc). This model is able to 
provide an explanation for the deflection signal in the Y direction 
by increasing the size of the correlated noise above the prior ex- 
pectation and altering the source proper motion in the Y direction. 
Therefore, additional follow-up data on the source after the lensing 
event, which would further constrain the source proper motion, will 
likely definitively rule out this model. Myc, however, is unable to 
explain the asymmetric deflection signal in the X direction. This is 
most prominently seen in epoch 2 which has a large signal-to-noise 
deflection in the direction opposite to the source X proper motion 
direction. 

The model with the deflection and just white noise, Mpw, pro- 
vides a comparatively poor explanation of the data according to the 
LOO score. Mpw fails to explain the data in epoch 5 where there is 
a predicted high amount of correlated noise from the lens PSF sub- 
tractions. The failure in epoch 5 is clear, when compared to Myc, 
despite Mpw increasing the size of the white noise and epoch 5 
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Figure 14. Left: Comparison of the mass of LAWD 37 inferred from the astrometric lensing event with theoretical MRR relationships for white dwarfs. Blue 
lines show the MRR for CO core white dwarfs with a thin (gz = 107!°), and thick (gz = 107+) hydrogen layer, and effective temperature equal to that of 
LAWD 37. For comparison, also shown is the theoretical MRR for a zero temperature white dwarf with an iron (Fe) core (Hamada & Salpeter 1961). Right: 
Hertzsprung-Russell diagram for LAWD 37. Coloured lines show the Montreal cooling tracks for a selection of white dwarf masses. Dashed lines indicate 
isochrones. The position of LAWD 37 agrees with the inferred microlensing mass of 0.56 + 0.08Mo. The implied age of LAWD 37 is 1.15 + 0.04 x 10° years. 


having a large deflection signal. Consequently, because the corre- 
lated noise can mimic the deflection signal, Mp wy interprets some 
of the unmodeled correlated noise as additional deflection signal and 
infers a high mass for LAWD 37 of My, = 0.6610-08 Mo. Overall, 
we rule out this model, and its mass inference, due to its poor LOO 
score compared with Mpc and Myc. 

We performed checks on the sensitivity of our inferences in our 
chosen model, Mpc, to our prior modeling assumptions. We find 
that, for Mpc, our inference on My, is not sensitive to the lens 
GEDR3 prior. This means that collecting additional follow-up as- 
trometric data on the lens (LAWD 37) to improve its astrometric 
solution is unlikely to improve the inference on Mt. In contrast, we 
find that, for Mpc, the inference on Mz is sensitive to the GEDR3 
prior on the source astrometry. Therefore, we conclude that collecting 
additional astrometric data to further constrain the source astromet- 
ric solution is likely to improve the inference on My, by ~ 3%. For 
the noise parameters in Mpc, both correlated and white, we found 
our inference on My, is sensitive to their chosen prior distributions. 
We found that simultaneously tightening the prior distributions on 
all the noise parameters does not significantly improve the constraint 
on M,. 

Overall, we find that the model with the microlensing deflection 
signal and correlated noise due to the lens PSF subtraction (Mpc) 
provides the best explanation of the data. Mpc provides an inference 
of the mass of LAWD 37 of My, = 0.56 + 0.08Mo. A no-deflection 
and correlated noise model, My c, provides a worse but competitive 
explanation of the data with Mpc. However, Myc, has to increase 
the size of the correlated noise above our PSF simulation expectations 
to explain the data in many epochs. Moreover, Myc cannot explain 
the high signal-to-noise negative X direction offset in epoch 2 which 
is well explained by Mpc. We therefore conclude the astrometric 
deflection signal is present and we measure LAWD 37’s mass to be 
Mz = 0.56+0.08Mo. 

In conclusion, the gravitational mass for LAWD 37 obtained via 
astrometric microlensing in this work is in agreement with theoretical 
MRR and cooling tracks expected from the evolutionary theory of CO 
core white dwarfs. LAWD 37 is also predicted to cause many more 
astrometric microlensing events over the coming decades (Bramich 
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2018; Kliiter et al. 2018b), which may offer further opportunities to 
increase the precision of the gravitational mass measurement. This 
work provides the first ever semi-empirical test of the white dwarf 
MRR for a single, isolated white dwarf and lends support to current 
white dwarf evolutionary theory. This work also marks only the third 
time that the astrometric microlensing effect has been detected via 
the prediction channel. 


This analysis reveals how the followup of future predicted astro- 
metric microlensing events (Bramich 2018; Bramich & Nielsen 2018; 
Kliiter et al. 2018b; Nielsen & Bramich 2018b) could be improved. 
Generally, predicted microlensing events permit targeted and opti- 
mised followup campaigns (e.g., Sahu et al. 2017; Zurlo et al. 2018; 
McGill et al. 2019b), because the time of maximum signal can be 
predicted and hence measurements can be clustered around it. This 
seems like a sensible approach in the first instance. However, typical 
predictable microlensing events are caused by nearby bright lenses 
(Dominik & Sahu 2000). This means usually we will encounter a 
scenario similar to this analysis, i.e. where the lens PSF subtraction 
introduces significant correlated noise into the data. This correlated 
noise is usually worse at closer lens-source separations (see e.g., Ta- 
ble 2), when the deflection signal is largest. Moreover, this analysis 
shows that the correlated noise can mimic the astrometric deflection 
signal. Ultimately this meant that epoch 2 in this analysis, which is in 
the tails of the event, was critical for measuring the deflection signal, 
more so than the data at the predicted maximum. The utility of epoch 
2 was due to a high signal to noise offset in a direction that could not 
be explained by alteration of the source’s astrometric parameters. 


With the power of hindsight, the importance of taking observa- 
tions at epoch 2’s time could have been determined ahead of planning 
the HST observations. This is because all of the lensing systems pa- 
rameters (lens and source astrometry and lens mass estimate) were 
already known, and the size of the correlated noise introduced by 
the lens PSF subtraction could be roughly estimated ahead of time. 
However, at the time of closest approach, LAWD 37 was at the edge 
of the “sun avoidance zone”, which caused some increased noise, and 
also a shortened visibility period during an orbit. The small range of 
available ORIENTs at this configuration made it impossible to keep 
the diffraction spike too far from the source. This turned out to be the 


most important factor in elevating the noise, and there was no way to 
avoid this. This aspect of the observation planning is also extremely 
difficult to simulate in advance. A future line of research could work 
out if HST (or perhaps even the James Webb Space Telescope; Gard- 
ner et al. 2006) observation times which optimally constrain the lens 
mass and/or rule out purely correlated noise explanations of the data 
can be determined ahead of time. This could be tested on future 
predicted microlensing events. 
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APPENDIX A: LEAVE-ONE-OUT CROSS-VALIDATION 
APPROXIMATION 


In this Section we explain how the LOO PSIS approximation is 
applied to the models considered in this work. This section closely 
follows Vehtari et al. (2017) and Biirkner et al. (2020). The goal of the 
approximation, given that we have obtained S posterior samples from 
fitting Mr y to the full data set D, {0° om 1» We wish to approximate, 


p(DilD-i.Mrw) = f p(Dil. Mrn)p(O1D-i. Mr nad. 
(Al) 


Specifically, we want to use oop , to approximately evaluate Eq. 
(A1) instead of having to refit the model (which is expensive) with the 
ith data point left-out to obtain {6° eae This is achieved by using 
an importance sampling approximation and re-weighting (a }° 
accordingly. 

Importance sampling is a Monte Carlo method used to compute 
expectations. We would like to evaluate the expectation, 


By (A(0)] = - h() f(0)d0, (A2) 


where f is some hard-to-sample-from distribution. Instead of using 
samples from f we can use samples from an easier-to-sample-from 
proposal distribution g, i ee provided we know the the ratio 
between f and g which is r(@3) = f(8g)/8 (8g). The importance 
sampling approximation is then, 


DS_, r(S) a(S) 
Ere). 


The quality of this approximation is sensitive to the distribution of 
the importance weights, r(03). If the proposal distribution, g, is 
not representative of the target distribution f, then the importance 
weights become unstable. This instability comes from the importance 
weights being dominated by a few extreme values which leads to 
them having a large or infinite variance. Ultimately this leads to a 
poor importance sampling approximation (Vehtari et al. 2015). To 
mitigate this problem, 7(@%) are fitted to a Pareto distribution and 
the extreme values are removed and are replaced with draws from 
the fitted Pareto distribution, (6%). These smoothed weights, 7(03), 
are dropped in as a replacement for r(6%) in Eq. (A3). This is called 
Pareto Smoothed Importance Sampling (PSIS; Vehtari et al. 2015). 
In addition to stabilising the importance sample approximation, PSIS 
also provides a diagnostic on the quality of the importance sampling 
approximation. This diagnostic is the fitted shape parameter, k, of 
the Pareto distribution. k traces the number of finite moments of 
the importance weights distribution and therefore the quality of the 
PSIS approximation. Vehtari et al. (2015) find that k < 0.7 indicates 
PSIS will work well whereas a value of k > 0.7 indicates the PSIS 
approximation is likely to be poor and should not be used. 

For approximating the LOO score, our hard-to-sample-from target 
distribution is p(@|D_;, Mr yn). This is the posterior distribution 
obtained when fitting the model to the data set with the ith data 


Ef [h(@)] * (A3) 


point left-out. This is hard to sample from because it is expensive 
and we would have to refit the model, which we want to avoid. h is 
p(D;|0, D_;, Mr n ). Our easy to sample from proposal distribution 
is p(0|D, Mrn) which we readily have samples from - a pare 
The only remaining task is to compute the ratio between the target 
and proposal distributions. 


#0) = P(O\D-i, Mrn) . P(O|Mrn)p(D-i10, Mrn) (A4) 
P(A|D, Mrn) P(O|Mrn)p(D|@, Mr n) 
p(D-i|0, Mr n) 
poy eae eee AS 
p(DIO, Mr w) on 


Here we have used Bayes rule which states that the ratio of the 
posteriors is proportional to the ratio of the prior x likelihood, and the 
priors terms have cancelled. For all models the likelihood factorises 
over the different epochs (Eq. 10). If the left-out data point, Dj, is the 
jth data point in epoch é, all other epochs cancel from the likelihood 
ratio in Eq. (A5) and, 


P(D-i|9,Mrn) _ 1 
P(D|O,Mrn) — p(De,j|De,-j,8, Mrn) 


The ratio is proportional to the inverse of the likelihood of the left- 
out data point (Biirkner et al. 2020). In all models considered in 
this work, p(Dg,;|Dz,-;,@, M) is the product of two multivariate 
Gaussian distributions (in the X and Y directions) as described in Eq. 
(9). Using the results from Sundararajan & Keerthi (2001), Biirkner 
et al. (2020) and Eq. (9), this likelihood can be computed efficiently 
as, 


r(Q) « (A6) 


log p(Dg, ;|Dz,-;,9, Mrn) = — log 2nGe, ; 


1 


(x21; - ¥,;) 


(AT) 
. \2 
+ ((Yel; 7 ¥2,7) 
Here, [e] ; 18 the jth component of vector e. The conditional means, 


Xz, j and Ye, j» and conditional standard deviation @,; are given by 
Biirkner et al. (2020) as, 


[(2 (Xe - XF)| 


R2,j = [Xelj - f, (A8) 
oJ J Ny\- 

[ae] ,, 
Fas= (is [May We Felli (A9) 
éj = (Vel; - Wa : 

[ae], 

and, 
; 1 
a (A10) 


[ey], 


Here, we have dropped the dependence of zy ,X . and ¥ on @ 
for brevity. [¢] ;; denotes the j diagonal element of the matrix e. 

Using these weights with PSIS, we can then approximate Eq. (A1) 
for each left-out data point using the posterior samples from the full 
dataset, {0° Pe . For each left-out data point, we obtain the Pareto k 
diagnostic. If for any left-out data point k > 0.7, we do not use the 
PSIS approximation, and instead perform a full refit to evaluate Eq. 
(A1) exactly. Fig. Al shows the Pareto K diagnostic for each left-out 
data point for all models. For every model the approximation was 
reliable for all left-out data points apart from Ds53. For all models 
we therefore computed this LOO term exactly by refitting the model 
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Figure A1. Pareto k values in the PSIS-LOO approximation used for each 
model. Different colours indicate different models. Different sized makers are 
used so points on top of each other can be differentiated. Dashed horizontal 
line indicates the threshold value of Pareto k (0.7) used to judge if the im- 
portance sampling approximation is reliable. For all models, the importance 
sampling approximation fails for the data point with index 54 indicated as a 
vertical red line. To obtain the LOO score for data point 54 the model was 
refit. 


with Ds53 left-out. Overall, with PSIS, we obtained an approximate 
LOO score for each model with only one additional refit of the model 
compared to the 81 refits required to compute LOO exactly. 

To further check the safety of the PSIS approximation, we com- 
puted all LOO terms for the model, Mpc, exactly and compared 
them to the PSIS approximation. Fig. A2 shows the comparison of 
the PSIS approximation and the exactly computed LOO terms from 
Mpc. The PSIS approximation matches the exact LOO computation 
well for all left-out data points apart from Ds54. Encouragingly, D54 
is the same data point that was flagged by PSIS as the approximation 
being unreliable (Fig. Al). Consequently, the model was refitted and 
the exact LOO value was used for D54. The large negative value of 
LOO54, Myc indicates that Ds54 is an outlying data point which is 
not well predicted by Mpwy. The failure of PSIS approximation for 
Ds54 means that the removal of this data point changes the posterior 
by a large amount compared with the removal of all other data points. 
Fig A2 also shows that the PSIS approximation was ~ 80 times faster 
than the exact LOO computation. 


APPENDIX B: CONSISTENCY WITH Gaia 


This Section contains the full corner plots for both the source (Fig. 
B1) and lens (Fig. B2) prior and posterior distributions of their as- 
trometric parameters. For both the lens and source, and all models, 
it can be seen that the posteriors are in good agreement with the 
GEDR3 priors. 
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Figure A2. PSIS approximation versus exact computation of the LOO terms 
for Mpc. The PSIS is good agreement with the exact values. The dashed 
line is the one-to-one relationship. The data point that failed the PSIS approx- 
imation Ds3, is marked in red. The approximate CPU computation times for 


the exact and PSIS methods are shown as text. 
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Figure B1. GEDR3 priors versus posterior inferences on all source astrometric parameters for all the models considered. The plot is a full version of the densities 
shown in Fig. 10. 
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Figure B2. GEDR3 priors versus posterior inferences on all the lens (LAWD 37) astrometric parameters for the two deflection models which contain the 
astrometry as free parameters. For both these models, the HST data provides no further constraint from GEDR3. 
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